Differential Training of Rollout Policies
Abstract
We consider the approximate solution of stochastic optimal control problems using a neurodynamic programming/reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popular methods of computing rollout policies are particularly sensitive to simulation and approximation error, and we present more robust alternatives, which aim to estimate relative rather than absolute Q-factor and cost-to-go values. In particular, we propose a method, called differential training, that can be used to obtain an approximation to cost-to-go differences rather than cost-to-go values by using standard methods such as TD(λ) and λ-policy iteration. This method is suitable for recursively generating rollout policies in the context of simulation-based policy iteration methods.
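To illustrate the robustness idea of estimating Q-factor *differences* rather than absolute values, here is a minimal sketch on a hypothetical toy problem of our own (a noisy one-dimensional walk with stage cost |s|; none of the names or dynamics come from the paper). Pairing the two simulated Q-factors with common random numbers makes the shared noise cancel in the difference, so the comparison between actions is far less sensitive to simulation error:

```python
import random

def step(s, a, rng):
    """Hypothetical toy dynamics: noisy integer walk with stage cost |s|."""
    s2 = s + a + rng.choice([-1, 0, 1])
    return s2, abs(s2)

def base_policy(s):
    """Base policy: always move toward the origin."""
    return -1 if s > 0 else 1

def q_factor(s, a, horizon, seed):
    """Simulated Q-factor: take action a, then follow the base policy.
    Seeding the generator lets us reuse the same noise across actions."""
    rng = random.Random(seed)
    s2, cost = step(s, a, rng)
    for _ in range(horizon):
        s2, c = step(s2, base_policy(s2), rng)
        cost += c
    return cost

def rollout_action(s, horizon=20, n_sims=100):
    """One-step rollout that compares actions via the *average Q-factor
    difference*, with common random seeds pairing the two simulations."""
    diff = 0.0
    for k in range(n_sims):
        diff += q_factor(s, +1, horizon, seed=k) - q_factor(s, -1, horizon, seed=k)
    return -1 if diff > 0 else +1
```

With common seeds, the two paired trajectories see identical noise, so their cost difference reflects only the effect of the first action; estimating the two Q-factors with independent noise would require many more simulations for the same decision quality.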
Similar references
Rollout Policies for Dynamic Solutions to the Multivehicle Routing Problem with Stochastic Demand and Duration Limits
We develop a family of rollout policies based on fixed routes to obtain dynamic solutions to the vehicle routing problem with stochastic demand and duration limits (VRPSDL). In addition to a traditional one-step rollout policy, we leverage the notions of the pre- and post-decision state to distinguish two additional rollout variants. We tailor our rollout policies by developing a dynamic decompos...
Average-Case Performance of Rollout Algorithms for Knapsack Problems
Rollout algorithms have demonstrated excellent performance on a variety of dynamic and discrete optimization problems. Interpreted as an approximate dynamic programming algorithm, a rollout algorithm estimates the value-to-go at each decision stage by simulating future events while following a heuristic policy, referred to as the base policy. While in many cases rollout algorithms are guarantee...
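The scheme this abstract describes can be sketched on a small 0/1 knapsack instance (a maximization problem, so the rollout estimates value-to-go rather than cost-to-go; the greedy base heuristic and the instance below are our own illustrative assumptions, not taken from the paper):

```python
def greedy_value(items, capacity):
    """Base heuristic: scan items in the given order, taking each one that fits.
    Items are (value, weight) pairs."""
    total = 0
    for v, w in items:
        if w <= capacity:
            capacity -= w
            total += v
    return total

def rollout_knapsack(items, capacity):
    """One-step rollout: at each stage, compare taking vs. skipping the current
    item, completing each choice with the greedy base heuristic, and commit to
    the better of the two."""
    value = 0
    for i, (v, w) in enumerate(items):
        rest = items[i + 1:]
        skip = greedy_value(rest, capacity)
        take = v + greedy_value(rest, capacity - w) if w <= capacity else float("-inf")
        if take >= skip:
            value += v
            capacity -= w
    return value
```

On the instance `[(60, 10), (100, 20), (120, 30)]` with capacity 50, the greedy base heuristic alone returns 160, while the rollout returns the optimum 220, illustrating the one-step improvement property: the rollout policy never does worse than its base heuristic.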
Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes
We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, the parallel rollout approach aims at the class of problems where we have multiple heuristic policies avai...
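The generalization described here, completing each candidate action with the *best* of several base policies rather than a single one, can be sketched on the same kind of hypothetical toy problem as above (a noisy integer walk with stage cost |s|; all dynamics and policy names are our own assumptions, not from the paper):

```python
import random

def step(s, a, rng):
    """Hypothetical toy dynamics: noisy integer walk."""
    return s + a + rng.choice([-1, 0, 1])

def simulate(policy, s, horizon, seed):
    """Accumulated cost of following one base policy from state s."""
    rng = random.Random(seed)
    cost = 0
    for _ in range(horizon):
        s = step(s, policy(s), rng)
        cost += abs(s)
    return cost

def parallel_rollout_action(s, policies, actions=(-1, +1), horizon=20, n_sims=50):
    """Parallel rollout: each action's Q-factor is completed with the best of
    several heuristic base policies, under common random numbers."""
    def q(a):
        total = 0.0
        for k in range(n_sims):
            rng = random.Random(k)            # same noise for every action
            s2 = step(s, a, rng)
            total += abs(s2) + min(simulate(p, s2, horizon, seed=k + 1)
                                   for p in policies)
        return total / n_sims
    return min(actions, key=q)

# Two illustrative base policies: one sensible, one poor.
toward_zero = lambda s: -1 if s > 0 else 1
always_up = lambda s: 1
```

Because the completion takes a minimum over policies, the resulting rollout policy inherits the improvement property with respect to *each* base policy simultaneously, which is the point of the parallel variant.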
Restocking-Based Rollout Policies for the Vehicle Routing Problem with Stochastic Demand and Duration Limits
We develop restocking-based rollout policies to make real-time, dynamic routing decisions for the vehicle routing problem with stochastic demand and duration limits. Leveraging dominance results, we develop a computationally tractable method to estimate the value of an optimal restocking policy along a fixed route. Embedding our procedure in rollout algorithms, we show restocking-based rollout ...
Solution methodologies for vehicle routing problems with stochastic demand
We present solution methodologies for vehicle routing problems (VRPs) with stochastic demand, with a specific focus on the vehicle routing problem with stochastic demand (VRPSD) and the vehicle routing problem with stochastic demand and duration limits (VRPSDL). The VRPSD and the VRPSDL are fundamental problems underlying many operational challenges in the fields of logistics and supply chain m...